Motivational salience plays an important role in shaping human behavior, but recent
studies demonstrate that human performance is not uniformly improved by motivation.
Instead, action has been shown to dominate valence in motivated tasks, and it is
particularly difficult for humans to learn to inhibit an action in order to obtain a reward.
The neural mechanism behind this behavioral specificity remains unclear. In all mammals,
including humans, the monoamine neurotransmitter dopamine is particularly important
in the neural manifestation of appetitively motivated behavior, and the human dopamine
system is subject to considerable genetic variability. The well-studied TaqIA restriction
fragment length polymorphism (rs1800497) has previously been shown to affect striatal
dopamine metabolism. In this study we investigated a potential effect of this genetic
variation on motivated action/inhibition learning. Two independent cohorts consisting of
87 and 95 healthy participants, respectively, were tested using the previously described
valenced go/no-go learning paradigm, in which participants learned the reward-associated
no-go condition significantly worse than all other conditions. This effect was modulated
by the TaqIA polymorphism, with carriers of the A1 allele showing a diminished
learning-related performance enhancement in the rewarded no-go condition compared to
the A2 homozygotes. This result highlights a modulatory role for genetic variability of the
dopaminergic system in individual differences in learning of action-valence interactions.
INTRODUCTION

Efficient decision making requires an individual to select
responses that maximize reward and minimize punishment or
loss. Such motivated behavior involves two fundamental axes of
control, namely valence (spanning reward and punishment) and
action (spanning invigoration and inhibition). Previous studies
have shown that these two axes are not independent (Guitart-Masip
et al., 2012b, 2013; Cavanagh et al., 2013; Chowdhury et al.,
2013; for review see Guitart-Masip et al., 2014) and that decision
making is influenced not only by an instrumental controller
that learns to optimize choices on the basis of their contingent
consequences, but also by a Pavlovian controller that generates
stereotyped, "hard-wired" behavioral responses to the occurrence
of motivationally salient outcomes or learned predictions of such
outcomes (Dickinson and Balleine, 2002; Guitart-Masip et al.,
2013). The presence of such "hard-wired" response patterns may
be an evolutionarily beneficial adaptation to an environment
in which obtaining a reward typically requires some sort
of overt behavioral response (go to win), whereas avoiding a
punishment requires refraining from those actions that may
lead to it (no-go to avoid losing). On the other hand, such a
response bias may also be a source of suboptimal behavior when
Pavlovian and instrumental controllers are in opposition (Breland
and Breland, 1961; Dayan et al., 2006; Boureau and Dayan, 2011).

In order to manipulate action and valence orthogonally,
Guitart-Masip et al. (2012b) designed a go/no-go learning task
that includes, besides the commonly investigated conditions go to
win and no-go to avoid losing, the two reversed conditions in which
the participant needs to perform an action to avoid a punishment
(go to avoid losing) or to inhibit an action to obtain a reward
(no-go to win). Studies employing this task have repeatedly shown that
while active choices in rewarded conditions and passive choices in
punished conditions can be learned easily, it is significantly harder
to learn an approach behavior to avoid a punishment, and even
more difficult to inhibit an action to obtain a reward. This
asymmetry indicates that signals that predict reward are prepotently
associated with behavioral activation, whereas signals that predict
punishment are intrinsically coupled to behavioral inhibition.

In the search for neural mechanisms underlying this behavioral
asymmetry in the coupling between action and valence,
monoaminergic, particularly dopaminergic, neuromodulation is 
a prime candidate (Gray and McNaughton, 2000; Boureau and 
Dayan, 2011; Cools et al., 2011). Dopamine (DA) is believed to 
enable or enhance the generation of active motivated behavior 
(Berridge and Robinson, 1998; Niv et al., 2007; Salamone et al., 
2007; Beierholm et al., 2013) and to support instrumental
learning (Frank et al., 2004; Daw and Doya, 2006; Wickens et al., 2007).
It has been observed that DA depletion leads to decreased motor 
activity and decreased motivated behavior (Ungerstedt, 1971; 
Palmiter, 2008), along with decreased vigor or motivation to work 
for rewards in demanding reinforcement schedules (Salamone 
et al., 2005; Niv et al., 2007). Conversely, boosting DA levels 
with levodopa invigorates motor responses in healthy humans 
(Guitart-Masip et al., 2012a) and DA promotes "go" and impairs 
"no-go" learning, for example in patients with Parkinson's disease 
(Frank et al., 2004). However, contrary to the expectations sug- 
gested by this evidence, administration of levodopa reduced the 
learning disadvantage of the no-go to win condition when com- 
pared to the no-go to avoid losing (Guitart-Masip et al., 2013). 
These effects suggested that DA is involved in decreasing the cou- 
pling between action and valence, supposedly via DA's actions 
on neural functions implemented in prefrontal cortex (Hitchcott 
et al., 2007). It is therefore unclear how striatal DA modulates the 
coupling between action and valence uncovered in this task. 

The aim of the present study was to test whether naturally
occurring differences among healthy humans in this valenced
action/inhibition learning might arise from dopaminergic
mechanisms, and how striatal DA affects the action/valence interaction.
To address this issue, we used the valenced go/no-go learning
paradigm in a cohort of young, healthy subjects, and tested them
for the TaqIA restriction fragment length polymorphism (rs1800497), a
common genetic variation of the dopamine D2 receptor (DRD2)
gene known to affect D2 receptor expression and striatal DA
metabolism. Although the underlying molecular mechanisms are
not yet fully understood, the TaqIA polymorphism has been
repeatedly associated with reduced striatal DRD2 density in A1
carriers, as evident from three post mortem studies (Noble et al.,
1991; Thompson et al., 1997; Ritchie and Noble, 2003) and two
out of three in vivo binding studies (Laruelle et al.,
1998; Pohjalainen et al., 1998; Jonsson et al., 1999). Laakso et al.
(2005) suggested that the lower D2 receptor expression leads
to decreased autoreceptor function, thereby increasing the DA
and/or trace amine synthesis rate in the brains of A1 allele
carriers. Moreover, Kirsch et al. (2006) observed an increase of striatal
BOLD signal in response to the dopamine D2 receptor agonist
bromocriptine in subjects carrying the A1 allele, but not in
subjects without the A1 allele, and Stelzel et al. (2010) reported a
generally increased striatal BOLD signal in A1 carriers. As
striatal BOLD signal has been shown to correlate with DA release
(Schott et al., 2008), the increased striatal activation in A1
carriers might be related to higher presynaptic dopaminergic activity
(Richter et al., 2013). Because striatal DA is associated with
linking action with reward (Berridge and Robinson, 1998; Frank et al.,
2004; Daw and Doya, 2006; Niv et al., 2007; Salamone et al., 2007;
Wickens et al., 2007; Beierholm et al., 2013), we hypothesized that
A1 carriers might show increased coupling between action and
valence.

MATERIALS AND METHODS 
PARTICIPANTS 

Participants were recruited from a cohort of 719 young healthy
volunteers of Caucasian ethnicity taking part in a large-scale behavioral
genetic study conducted at the Leibniz Institute for Neurobiology,
Magdeburg. Given our hypothesis regarding differential
performance in the valenced go/no-go task as a function of striatal
D2 receptor availability, we selected participants a priori as a
function of DRD2 TaqIA genotype. To control for confounding
effects of genetic influences on prefrontal DA availability, we also
ensured a balanced distribution of the COMT Val108/158 Met
polymorphism, which is known to affect prefrontal DA levels and
D1 receptor binding (Gogos et al., 1998; Matsumoto et al., 2003;
Meyer-Lindenberg et al., 2005; Slifstein et al., 2008). All
participants were right-handed according to self-report, not genetically
related, and had obtained at least a university entrance diploma
(Abitur) as educational certificate. Importantly, all participants
had undergone a routine clinical interview to exclude present or
past neurological or psychiatric illness, alcohol or drug abuse,
use of centrally acting medication, the presence of psychosis or
bipolar disorder in a first-degree relative, and additionally, given
the design of the experiment, regular gambling. Two independent
cohorts of healthy participants were tested (cohort 1: 43
females and 44 males; age: range 19-36 years, mean 24.6 years,
SD = 3.1 years; cohort 2: 48 females and 47 males; age: range
20-33 years, mean 24.6 years, SD = 2.8 years). Because of a
previously reported potential association of the A1 allele with nicotine
consumption (Verde et al., 2011; for reviews see Comings and
Blum, 2000; Lerman et al., 2007), the participants' smoking status
was assessed. All participants gave written informed
consent in accordance with the Declaration of Helsinki and received
financial compensation for participation. The work was approved
by the Ethics Committee of the University of Magdeburg, Faculty
of Medicine.

GENOTYPING

The DRD2/ANKK1 TaqIA restriction fragment length polymorphism
(NCBI accession number: rs1800497) was genotyped using a
protocol previously described in Richter et al. (2013). Genomic DNA
was extracted from blood leukocytes using the GeneMole® automated
system (Mole Genetics AS, Lysaker, Norway) according to
the manufacturer's protocol. Genotyping was performed using
PCR followed by allele-specific restriction analysis using previously
described primers (Grandy et al., 1989). Genotyping was
also performed for several additional polymorphisms, including
COMT Val108/158 Met (see Table 1), to control for confounding
effects of other genetic variants and to reduce the risk of
population stratification.

PARADIGM 

We used a previously employed go/no-go learning task with
orthogonalized action requirements and outcome valence
(Guitart-Masip et al., 2012b, 2013; Chowdhury et al., 2013). The
trial timing is displayed in Figure 1. Each trial consisted of the
presentation of a fractal cue, a target detection task, and a probabilistic
outcome. First, one out of four abstract fractal cues was displayed
for 1000 ms. Participants were informed that a fractal indicated
whether they would subsequently be required to perform a target
detection task by pressing a button (go) or not (no-go) and
that the cue also indicated the possible valence of the outcome
of the subjects' behavior (reward/no reward or punishment/no
punishment). However, subjects were not instructed about the
contingencies for each fractal image and had to learn them by
trial and error. The meaning of the fractal images was randomized
across participants. Following a variable interval (250-3500 ms)
after offset of the fractal image, the target detection task started:
participants had the opportunity to press a button within a time
limit of 2000 ms to indicate the side of a circle for go trials, or not
to press for no-go trials. After the offset of the circle (after 1500 ms)
and a further 1000 ms of fixation, subjects were presented with the
outcome. The outcome remained on screen for 2000 ms, and after a
variable intertrial interval (ITI; 750-1500 ms) a new trial started.
Participants were informed that the outcome was probabilistic: in
win trials 80% of correct choices and 20% of incorrect choices
were rewarded with 0.50 € (the remaining 20% of correct and
80% of incorrect choices leading to no outcome), while in avoid
losing trials 80% of correct choices and 20% of incorrect choices
avoided a loss of 0.50 € (the remaining 20% of correct and 80% of
incorrect choices leading to a punishment). Thus, there were four
trial types depending on the nature of the fractal cue presented
at the beginning of the trial: press the correct button in the target
detection task to gain a reward (go to win); press the correct
button in the target detection task to avoid punishment (go to avoid
losing); do not press a button in the target detection task to gain
a reward (no-go to win); do not press a button in the target
detection task to avoid punishment (no-go to avoid losing). The task
included 240 trials, 60 trials per condition, and was divided into
four sessions of 9 min each (15 trials per condition in randomized
order). Subjects were told that they would be paid their earnings
from the task, up to a maximum of 25 € and with a minimum of 7 €.
Before starting with the learning task, subjects performed 10 trials of the
target detection task in order to get familiarized with the speed
requirements.
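
The outcome contingencies described above can be illustrated with a short simulation. This is only a sketch under stated assumptions, not the authors' task code: the function names, the `rng` argument and the string labels for valence and action are hypothetical, and the left/right side judgment on go trials is ignored for brevity.

```python
import random

# Contingencies as described in the text: 80% of correct and 20% of
# incorrect choices lead to the better of the two possible outcomes.
P_GOOD_OUTCOME = {True: 0.8, False: 0.2}
STAKE_EUR = 0.50


def is_correct(required_action, pressed):
    """A go cue requires a button press; a no-go cue requires withholding.

    Simplification (assumption): the side of the press is not modeled.
    """
    return pressed if required_action == "go" else not pressed


def sample_outcome(valence, correct, rng):
    """Return the monetary outcome (in EUR) of a single trial.

    valence: "win" (reward vs. nothing) or "avoid" (nothing vs. loss).
    correct: whether the choice matched the action required by the cue.
    rng:     a random.Random instance (hypothetical helper argument).
    """
    good = rng.random() < P_GOOD_OUTCOME[correct]
    if valence == "win":
        return STAKE_EUR if good else 0.0
    if valence == "avoid":
        return 0.0 if good else -STAKE_EUR
    raise ValueError(f"unknown valence: {valence!r}")


# Example: a correctly withheld response in a no-go to win trial.
rng = random.Random(1)
outcome = sample_outcome("win", is_correct("no-go", pressed=False), rng)
```

Because feedback is probabilistic, a single outcome is only weakly informative; the contingencies become apparent to a learner only across repeated trials of the same cue.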

STATISTICAL ANALYSIS 

The percentage of correct choices in the target detection task
(correct button press for go conditions and correct omission of
responses in no-go trials) was collapsed across time bins of 30
trials per condition and analyzed with a mixed ANOVA with
time (1st/2nd half), action (go/no-go), and valence (win/lose)
as within-subject factors and TaqIA genotype (A1+/A1-) as
between-subject factor. Additionally, reaction times of correct



Table 1 | Genotyped polymorphisms.

Polymorphism/Gene | NCBI accession number | Genotyping protocol
DRD2/ANKK1 TaqIA | rs1800497 | Richter et al., 2013. Primers for PCR: 5'-CCGTCGACGGCTGGCCAAGTTGTCTA-3', 5'-CCGTCGACCCTTCCTGAGTGTCATCA-3'. Restriction enzyme: TaqI
COMT Val108/158 Met | rs4680 | Schott et al., 2006; Wimber et al., 2011. Primers for PCR: 5'-ATGGCCCGCCTGCTGTCACCAG-3', 5'-TCTGACAACGGGTCAGGCACGCACAC-3'. Restriction enzyme: Hin1II (NlaIII)
DAT1 VNTR | rs28363170 | Schott et al., 2006. Primers for PCR: 5'-TGTGGTGTAGGAAACGGCCTGAG-3', 5'-CTTCCTGGAGGTCACGGCTCAAAGG-3'. PCR products were not digested
DRD2 C957T | rs6277 | Kompetitive allele-specific PCR (KASP) Assay on Demand (LGC Genomics, Berlin, Germany)
DARPP-32 | rs907094 | Primers for PCR: 5'-GCACCCCATGGAGCGAGAAGACAG-3', 5'-CGCATTGCTGAGTCTCACCTGCAGTC-3'. Restriction enzyme: Tru1I

FIGURE 1 | Experimental paradigm of the probabilistic monetary go/no-go
task. Fractal images indicate the combination between action (go or no-go) and
valence (reward or loss). On go trials, subjects press a button for the side of a
circle. On no-go trials they withhold a response. Arrows indicate rewards (green)
or losses (red). Horizontal bars (yellow) symbolize the absence of a win or a loss.
The schematics at the bottom represent for each trial type the nomenclature
(left), the possible outcomes and their probabilities after response to the target
("go"; middle), and the possible outcomes and their probabilities after
withholding a response to the target ("no-go"; right). gw, go to win; gal, go to
avoid losing; ngw, no-go to win; ngal, no-go to avoid losing; ITI, intertrial interval.



go responses (RTs) were analyzed using a mixed ANOVA with
valence (win/lose) and TaqIA genotype (A1+/A1-) as factors.
When appropriate, paired t-tests, independent-samples t-tests, or
Mann-Whitney U-tests were used as post-hoc tests.
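
The dependent variable described above (percentage of correct choices per time bin of 30 trials) and the derived improvement score can be computed as in the following minimal sketch. The function names and the boolean per-trial input format are assumptions made for illustration; the text does not describe the authors' actual analysis scripts.

```python
def percent_correct(choices_correct):
    """Percentage of correct choices in a sequence of booleans."""
    return 100.0 * sum(choices_correct) / len(choices_correct)


def learning_gain(choices_correct, bin_size=30):
    """Accuracy in the 2nd time bin minus accuracy in the 1st time bin.

    choices_correct: per-trial booleans for ONE condition, in trial order
    (the task has 60 trials per condition, i.e. two bins of 30 trials).
    Returns (first-half %, second-half %, difference).
    """
    if len(choices_correct) != 2 * bin_size:
        raise ValueError("expected exactly two time bins of trials")
    first = percent_correct(choices_correct[:bin_size])
    second = percent_correct(choices_correct[bin_size:])
    return first, second, second - first


# Toy data: 15/30 correct in the first half, 27/30 in the second half.
toy = [True] * 15 + [False] * 15 + [True] * 27 + [False] * 3
first_half, second_half, gain = learning_gain(toy)  # 50.0, 90.0, 40.0
```

One such triple per participant and condition would then enter the mixed ANOVA (the two half-wise accuracies) or the post-hoc group comparisons (the difference score).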

The analysis of the behavioral data was done in two stages.
In cohort 1 we included the TaqIA and the COMT Val108/158
Met polymorphisms as between-subject factors. In the second stage
(cohort 2) we specifically aimed to replicate the significant effect
of TaqIA. The statistics reported below include TaqIA as the only
between-subject factor.

RESULTS 
GENOTYPING

Genotyping was performed in the entire cohort of 719 subjects,
and two sub-cohorts were recruited based on the DRD2/ANKK1
TaqIA genotype. The data of 87 participants in cohort 1 and 95
participants in cohort 2 were analyzed. In cohort 1, we identified
4 A1 homozygotes, 33 heterozygotes and 50 A2 homozygotes. In
cohort 2, genotyping revealed 4 A1 homozygotes, 30 heterozygotes
and 61 A2 homozygotes. The distributions in both groups
were at Hardy-Weinberg equilibrium (cohort 1: χ² = 0.24,
p = 0.621; cohort 2: χ² = 0.02, p = 0.898). A1 carriers (A1+:
A1/A1 and A1/A2) were grouped together for all subsequent analyses,
as in previous behavioral and imaging studies of the TaqIA
polymorphism (Stelzel et al., 2010; Richter et al., 2013). The
groups A1+ and A1- (A2/A2) did not differ in gender, in age
or in the number of smokers and nonsmokers (Table 2).
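
The Hardy-Weinberg check reported above can be retraced from the genotype counts. The sketch below assumes a standard Pearson chi-square goodness-of-fit test with one degree of freedom; the text does not state which procedure or software was used, and the function name is hypothetical.

```python
import math


def hardy_weinberg_chi2(n_a1a1, n_a1a2, n_a2a2):
    """Chi-square test (1 df) of genotype counts against Hardy-Weinberg
    proportions; returns (chi2, p_value)."""
    n = n_a1a1 + n_a1a2 + n_a2a2
    p = (2 * n_a1a1 + n_a1a2) / (2 * n)  # A1 allele frequency
    q = 1.0 - p                          # A2 allele frequency
    observed = (n_a1a1, n_a1a2, n_a2a2)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2_c1, p_c1 = hardy_weinberg_chi2(4, 33, 50)  # cohort 1 counts
chi2_c2, p_c2 = hardy_weinberg_chi2(4, 30, 61)  # cohort 2 counts
```

Under these assumptions, the counts given in the text yield values that round to the reported statistics for both cohorts.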

To control for effects of prefrontal DA availability, participants
were also selected with regard to the COMT Val108/158 Met (NCBI
accession number: rs4680) polymorphism. Genotyping revealed
31 Met/Met, 29 Val/Met, and 27 Val/Val carriers in cohort 1 and
30 Met/Met, 41 Val/Met, and 24 Val/Val carriers in cohort 2.
The distribution of the COMT Val108/158 Met polymorphism
did not differ significantly between TaqIA A1 carriers and
A2 homozygotes (Table 2). The experimenters who performed
the behavioral task were blinded regarding DRD2/ANKK1 and
COMT genotypes.

To further control for effects of population stratification and 
potential effects of putatively functional genetic variations in 
the dopamine system, genotyping was also performed for the 
Table 2 | Demographic data.

Measure | A1+ | A1- | Group comparison
COHORT 1
Women/Men (n = 87) | 17/20 | 26/24 | χ² = 0.31, p = 0.577
Mean age (n = 87) | 24.9 ± 3.6 | 24.3 ± 2.6 | t(85) = 0.83, p = 0.410
Smokers/Nonsmokers (n = 87) | 15/22 | 14/36 | χ² = 1.51, p = 0.220
COMT mm/vm/vv (n = 87) | 13/14/10 | 18/15/17 | χ² = 0.73, p = 0.694
DAT1-VNTR 9+/9- (n = 85) | 11/25 | 15/34 | χ² < 0.01, p = 0.996
C957T CC/CT/TT (n = 87) | 11/19/7 | 8/24/18 | χ² = 4.04, p = 0.132
DARPP-32 CC/CT/TT (n = 87) | 20/15/2 | 29/19/2 | χ² = 0.19, p = 0.912
COHORT 2
Women/Men (n = 95) | 13/21 | 35/26 | χ² = 3.20, p = 0.074
Mean age (n = 95) | 25.2 ± 3.3 | 24.2 ± 2.4 | t(93) = 1.58, p = 0.121
Smokers/Nonsmokers (n = 95) | 5/29 | 14/47 | χ² = 0.93, p = 0.335
COMT mm/vm/vv (n = 95) | 11/14/9 | 19/27/15 | χ² = 0.09, p = 0.957
DAT1-VNTR 9+/9- (n = 93) | 17/17 | 32/27 | χ² (value not recoverable)
C957T CC/CT/TT (n = 95) | 15/17/2 | 3/37/21 | χ² = 25.49, p < 0.001
DARPP-32 CC/CT/TT (n = 95) | 15/16/3 | 41/20/0 | χ² = 8.53, p = 0.014

Gender distribution, age (means ± standard deviations), number of smokers and nonsmokers. Allelic distributions for the following polymorphisms: COMT Val108/158
Met (mm, met homozygotes; vm, val/met heterozygotes; vv, val homozygotes), DAT1-VNTR (9+, carriers of the 9-repeat allele 9/9 and 9/10; 9-, 10-repeat
homozygous subjects 10/10), C957T (CC/CT/TT carriers) and DARPP-32 (CC/CT/TT carriers). A1+, carriers of the A1 allele; A1-, A2 homozygotes.



DAT1-VNTR (NCBI accession number: rs28363170), the C957T
polymorphism within the DRD2 gene (NCBI accession number:
rs6277) and the DARPP-32 polymorphism (NCBI accession
number: rs907094) (see Table 1). The distribution of
the DAT1-VNTR polymorphism did not differ significantly
between TaqIA A1 carriers and A2 homozygotes (Table 2). However,
because of differences for the C957T and the DARPP-32 polymorphisms,
we additionally calculated an ANCOVA including these
two polymorphisms as covariates (see below).

BEHAVIORAL RESULTS 

We initially performed an omnibus mixed-design ANOVA to test
for effects of both DRD2/ANKK1 and COMT genotypes. There
was a significant four-fold interaction of DRD2/ANKK1 TaqIA
with action, time and valence [F(1,81) = 5.11, p = 0.027], but no
effect of the COMT Val108/158 Met polymorphism (all p > 0.120).
All further analyses therefore focused on the DRD2/ANKK1
TaqIA polymorphism. We computed an ANOVA for repeated
measures on the percentage of correct (optimal) choices with
action (go/no-go), valence (win/lose) and time (1st/2nd half)
as within-subject factors and genotype (A1+/A1-) as
between-subject factor. See Table 3 for statistics.

Our study reproduced a main effect of action [cohort 1:
F(1,85) = 62.56, p < 0.001; cohort 2: F(1,93) = 50.87, p < 0.001]
and an action by valence interaction [cohort 1: F(1,85) = 44.41,
p < 0.001; cohort 2: F(1,93) = 37.72, p < 0.001], as demonstrated
in previous studies (Guitart-Masip et al., 2012b, 2013;
Cavanagh et al., 2013; Chowdhury et al., 2013). Subjects showed
better performance in conditions requiring a go choice than in
trials requiring a no-go choice [cohort 1: t(86) = 7.97, p < 0.001;
cohort 2: t(94) = 7.68, p < 0.001], and while they were better
at learning from reward as compared to punishment in the go
condition [cohort 1: t(86) = 6.28, p < 0.001; cohort 2: t(94) =
5.74, p < 0.001], this relation reversed in the no-go condition
[cohort 1: t(86) = 4.99, p < 0.001; cohort 2: t(94) = 4.63, p <
0.001]. Like Guitart-Masip et al. (2012b, 2013), we also observed a
main effect of time [cohort 1: F(1,85) = 135.92, p < 0.001; cohort
2: F(1,93) = 189.21, p < 0.001] and additionally an action by
time interaction [cohort 1: F(1,85) = 19.09, p < 0.001; cohort
2: F(1,93) = 59.77, p < 0.001], indicating a preponderant initial
bias toward go responses [cohort 1: t(86) = 4.62, p < 0.001;
cohort 2: t(94) = 8.46, p < 0.001].

Most interestingly for the current study, we observed a four-fold
interaction of action by valence by time by genotype
[cohort 1: F(1,85) = 5.24, p = 0.025; cohort 2: F(1,93) = 4.59,
p = 0.035]. This effect was observed in the absence of an action
by valence by genotype effect (cohort 1: p = 0.811; cohort 2:
p = 0.087). While the genotype groups did not differ significantly
in their mean performance in the first and second time
bin in any condition (cohort 1: p > 0.143; cohort 2: p > 0.167),
they showed a different degree of improvement from the first
to the second time interval (learning gain: mean performance
2nd half - mean performance 1st half; see Figure 2). Performance
of the A2 homozygotes in the no-go to win condition showed
greater improvement from the first to the second half of the
experiment than that of the A1 carriers [cohort 1: t(85) = 2.78,
p = 0.007]. In the second cohort this result was replicated [cohort
2: t(93) = 2.16, p = 0.033], and A1 carriers additionally showed lower
performance in the go to avoid losing condition [cohort 2: t(93) = 2.26,
p = 0.026]. Because performance in the no-go to win condition
during early trials differed between the two cohorts, we tested
whether the observed interaction, which would likely reflect a
difference in learning rate, remained significant when combining
both datasets. A three-way ANCOVA across both cohorts
(including cohort as a covariate of no interest; see Figure 2)
revealed the same three-way interaction as the analyses
Table 3 | Statistics on percentage of correct responses.

Effects | Cohort 1 | Cohort 2
Action | F(1,85) = 62.56, p < 0.001, η² = 0.42 | F(1,93) = 50.87, p < 0.001, η² = 0.35
Go > no-go | go: 87 ± 12%; no-go: 73 ± 21%; t(86) = 7.97, p < 0.001 | go: 91 ± 9%; no-go: 79 ± 18%; t(94) = 7.68, p < 0.001
Time | F(1,85) = 135.92, p < 0.001, η² = 0.62 | F(1,93) = 189.21, p < 0.001, η² = 0.67
2nd half > 1st half | 1st half: 74 ± 15%; 2nd half: 86 ± 16%; t(86) = 11.89, p < 0.001 | 1st half: 78 ± 13%; 2nd half: 92 ± 13%; t(94) = 14.68, p < 0.001
Action x valence | F(1,85) = 44.41, p < 0.001, η² = 0.34 | F(1,93) = 37.72, p < 0.001, η² = 0.29
Go to win > go to avoid losing | gw: 91 ± 14%; gal: 82 ± 14%; t(86) = 6.28, p < 0.001 | gw: 95 ± 12%; gal: 87 ± 10%; t(94) = 5.74, p < 0.001
No-go to avoid losing > no-go to win | ngw: 66 ± 32%; ngal: 81 ± 16%; t(86) = 4.99, p < 0.001 | ngw: 73 ± 30%; ngal: 86 ± 11%; t(94) = 4.63, p < 0.001
Action x time | F(1,85) = 19.09, p < 0.001, η² = 0.18 | F(1,93) = 59.77, p < 0.001, η² = 0.39
1st half (go - no-go) > 2nd half (go - no-go) | 1st half: 17 ± 17%; 2nd half: 9 ± 18%; t(86) = 4.62, p < 0.001 | 1st half: 18 ± 17%; 2nd half: 6 ± 16%; t(94) = 8.46, p < 0.001
Action x valence x time x genotype | F(1,85) = 5.24, p = 0.025, η² = 0.06 | F(1,93) = 4.59, p = 0.035, η² = 0.05
A1-(ngw(2nd - 1st half)) > A1+(ngw(2nd - 1st half)) | A1+: 8 ± 21%; A1-: 22 ± 26%; t(85) = 2.78, p = 0.007 | A1+: 15 ± 22%; A1-: 25 ± 24%; t(93) = 2.16, p = 0.033

Means ± standard deviations are shown. Only effects that were significant
in both cohorts are reported. ANOVA was computed with percent correct
responses as dependent variable and action, valence, time and genotype as
independent variables. Paired t-tests and t-tests for independent samples were
performed as post-hoc tests. gw, go to win; gal, go to avoid losing; ngw, no-go
to win; ngal, no-go to avoid losing. A1+, carriers of the A1 allele; A1-, A2
homozygotes.


in the separate cohorts [F(1,179) = 9.87, p = 0.002]. Only in
one cohort was there a statistically significant three-way
interaction [action by valence by time; cohort 1: F(1,85) = 0.42, p =
0.517; cohort 2: F(1,93) = 10.98, p = 0.001] and a time by genotype
interaction [cohort 1: F(1,85) = 3.77, p = 0.055; cohort 2:
F(1,93) = 6.31, p = 0.014].
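
The post-hoc group comparison of learning gains in the no-go to win condition can be approximated from the summary statistics in Table 3. The sketch below assumes a pooled-variance (Student) independent-samples t-test and uses the rounded means and standard deviations from the table, so it will not reproduce the reported t value exactly; the function name is hypothetical.

```python
import math


def pooled_t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's t for two independent groups, computed from group
    means, standard deviations and sizes; returns (t, df)."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / se, df


# Cohort 1, no-go to win learning gain (rounded values from Table 3):
# A2 homozygotes (A1-): 22 ± 26%, n = 50; A1 carriers (A1+): 8 ± 21%, n = 37.
t_value, df = pooled_t_from_summary(22, 26, 50, 8, 21, 37)
```

With these rounded inputs the statistic comes out somewhat below the reported t(85) = 2.78, which is the kind of discrepancy expected when working from rounded summary values rather than the raw data.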

Statistics regarding reaction times (RTs) of the go responses are
summarized in Table 4. We computed an ANOVA with valence
(win/lose) as within-subject factor and genotype as
between-subject factor. Irrespective of genotype, RTs in the go to win
condition were shorter than in the go to avoid losing condition
[cohort 1: F(1,85) = 14.06, p < 0.001; cohort 2: F(1,93) = 11.21,
p = 0.001]. Regarding DRD2/ANKK1 TaqIA genotype, there was
only a trendwise interaction with valence [F(1,93) = 3.38, p =
0.069] and a trend for a main effect [F(1,93) = 3.67, p = 0.058]
in cohort 2, with the A1 carriers being slower in avoiding
punishment as compared to the A2 homozygotes [t(93) = 2.04, p =
0.046]. Although this nominal effect together with the worse
accuracy of the A1 carriers in the go to avoid losing condition
(Figure 2) hints at a worse performance of the A1 carriers in this
condition, the interpretation of this result warrants caution as the
effects were only apparent in cohort 2 and, moreover, participants
were explicitly instructed to respond accurately, while speed was
not emphasized.

To rule out that the genotype effects were simply explained
by differences in target detection performance, the percentage
of trials in which subjects responded incorrectly in the target
detection task (i.e., left when the target was on the right side
of the display or vice versa) was measured; it did not differ
significantly between genotype groups (Mann-Whitney U-test:
cohort 1: A1+: M ± SD = 1 ± 3%, A1-: M ± SD = 1 ± 2%,
z = -0.334, p = 0.738; cohort 2: A1+: M ± SD = 1 ± 3%, A1-:
M ± SD = 0 ± 1%, z = -0.428, p = 0.668).

Because the TaqIA polymorphism is located downstream of
the DRD2 gene, the observed genotype effects might putatively
result from linkage disequilibrium with other DRD2 polymorphisms,
including the C957T. We indeed observed an imbalanced
distribution of the C957T polymorphism (rs6277) among TaqIA
A1 carriers vs. A2 homozygotes numerically in the first cohort
(χ² = 4.04, p = 0.132) and significantly in the second cohort
(χ² = 25.49, p < 0.001). Moreover, the DARPP-32 polymorphism
(rs907094) was unequally distributed in the second cohort
only (χ² = 8.53, p = 0.014). In order to rule out confounding
effects, we included the polymorphisms as covariates in an
additional ANCOVA. The same was done for COMT Val108/158
Met (rs4680), because the cohorts were stratified with respect to
that polymorphism. Importantly, the four-fold action by valence
by time by genotype interaction for the TaqIA polymorphism
remained significant [cohort 1: F(1,82) = 4.63, p = 0.034; cohort
2: F(1,90) = 5.07, p = 0.027], while there was no effect for C957T
(cohort 1: p = 0.472, cohort 2: p = 0.810), DARPP-32 (cohort
1: p = 0.578, cohort 2: p = 0.148) or the COMT Val108/158 Met
polymorphism (cohort 1: p = 0.161, cohort 2: p = 0.856).

DISCUSSION 

The goal of this study was to investigate how a genetic variant
linked to striatal DA responsivity affects the action/valence
interaction. To this end, two independent cohorts consisting
of 87 and 95 healthy participants were genotyped for the
well-characterized DRD2/ANKK1 TaqIA polymorphism (Grandy
et al., 1989; Dubertret et al., 2004; Neville et al., 2004) and
performed the previously described valenced go/no-go task (Guitart-Masip
et al., 2012b, 2013, 2014; Cavanagh et al., 2013; Chowdhury
et al., 2013). Our results show differential learning performance
in the carriers of the less common A1 allele of the TaqIA
polymorphism, which has previously been linked to lower striatal
dopamine D2 receptor expression. Replicating previous results,
FIGURE 2 | Effects of TaqIA genotype on choice performance in two
independent cohorts and in the entire sample (data of both cohorts
combined). Line charts at the left show mean values of correct
responses (±s.e.m.) in A1 carriers (red) and A2 homozygotes (blue) in
the first and the second half of trials for all four conditions. Bar plots at
the right show the differences between mean (±s.e.m.) values of correct
responses of second half of trials minus first half of trials in A1 carriers
(red) and A2 homozygotes (blue) for each condition. This score
represents the four-fold interaction of action by valence by time by
genotype. Compared to the A2 homozygotes, carriers of the A1 allele
showed diminished learning to withhold an action to receive a reward.
Post-hoc comparisons via t-test: *p < 0.05.



participants were, irrespective of genotype, more successful in
learning active choices in rewarded conditions and passive choices
in punished conditions, with response inhibition to obtain a
reward (no-go to win) being the condition most difficult to learn.
The DRD2 TaqIA polymorphism exerted a modulatory influence
on learning performance in the no-go to win condition, with A1
carriers showing lower learning rates throughout the experiment.

It has to be emphasized that, although the learning curves of the
two cohorts differed to some extent in the present study
and initial performance of A1 carriers was not identical, we
nevertheless observed a replicable attenuation of learning rates in
A1 carriers that was specific to the no-go to win condition. Importantly,
the effect was even more pronounced when combining both
datasets (using cohort as a covariate of no interest; see Figure 2).
Table 4 | Statistics on reaction times of correct go responses.

Condition | A1+ | A1- | Group comparison
COHORT 1
Go to win | 527 ± 128 ms | 535 ± 88 ms | t(85) = -0.36, p = 0.719
Go to avoid losing | 547 ± 129 ms | 564 ± 117 ms | t(85) = -0.65, p = 0.521
COHORT 2
Go to win | 561 ± 100 ms | 534 ± 76 ms | t(93) = 1.48, p = 0.144
Go to avoid losing | 583 ± 107 ms | 540 ± 76 ms | t(93) = 2.04, p = 0.046

Means ± standard deviations are shown. A1+, carriers of the A1 allele; A1-, A2
homozygotes.



It is important to note that there are two potential mechanisms
by which valence can disrupt the choice of appropriate actions
in the current task. The first mechanism is implemented at the
time of the choice and can be seen as a "Pavlovian" mechanism
by which the anticipation of reward or punishment promotes
action or inhibition, respectively (Dayan et al., 2006; Huys et al.,
2011; Guitart-Masip et al., 2012b). The second mechanism is
implemented at the time of outcome and is related to the role
of DA within the striatum. According to a prevalent view in
reinforcement learning and decision making, DA neurons signal
reward prediction errors (Montague et al., 1996; Schultz et al.,
1997; Bayer and Glimcher, 2005), in the form of phasic bursts
for positive prediction errors and dips below baseline firing rate
for negative prediction errors (Bayer et al., 2007), resulting in
corresponding peaks and dips of dopamine availability in target
structures, most prominently the striatum (McClure et al.,
2003; O'Doherty et al., 2003, 2004; Pessiglione et al., 2006). In the
striatum, increases of DA in response to an unexpected reward
reinforce the direct pathway via activation of D1 receptors and
thereby facilitate the future generation of go choices under similar
circumstances, while dips in DA levels in response to an unexpected
punishment reinforce the indirect pathway via reduced
activation of D2 receptors and thus facilitate the subsequent
generation of no-go choices in comparable situations (Frank et al.,
2004, 2007; Wickens et al., 2007; Hikida et al., 2010; see Figure 3).
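
The outcome-time mechanism described above can be made concrete with a deliberately simplified sketch. It is not the model fitted in this or any cited study; the function `update_pathways`, its learning rate `lr`, and the two scalar weights standing in for the direct and indirect pathways are assumptions chosen only to illustrate the sign logic of prediction-error-driven go and no-go learning.

```python
def update_pathways(go_w, nogo_w, expected, outcome, lr=0.1):
    """Apply one outcome-driven learning step for a single stimulus.

    delta > 0 (outcome better than expected) strengthens the go weight,
    standing in for the direct pathway; delta < 0 (outcome worse than
    expected) strengthens the no-go weight, standing in for the indirect
    pathway. Returns the updated (go_w, nogo_w, expected) and delta.
    """
    delta = outcome - expected       # reward prediction error
    if delta > 0:
        go_w += lr * delta           # DA burst: reinforce go
    elif delta < 0:
        nogo_w += lr * (-delta)      # DA dip: reinforce no-go
    expected += lr * delta           # move the expectation toward the outcome
    return go_w, nogo_w, expected, delta


# Unexpected reward (+1) versus unexpected punishment (-1), both starting
# from a neutral expectation of 0 (hypothetical values).
after_reward = update_pathways(0.0, 0.0, 0.0, 1.0)
after_punishment = update_pathways(0.0, 0.0, 0.0, -1.0)
```

In this toy version an unexpected reward changes only the go weight and an unexpected punishment changes only the no-go weight, which mirrors the verbal account; real models of striatal learning are considerably richer.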

The effects related to the TaqIA polymorphism observed in
the present study apparently reflect changes in the learning
process, thus likely pointing to the function of DA in the ability to
flexibly learn go or no-go choices based on the outcomes
produced by previous actions. Our results are in apparent contrast
to the effects previously reported in the same task after
administration of levodopa. In that study, boosting DA levels resulted
in a decoupling between action and valence that did not reflect
any changes in the rate of learning (Guitart-Masip et al., 2013).
Instead, levodopa raised the asymptote reached by the participants
who received it. Using
computational modeling, that effect was best characterized as a
decreased influence of a Pavlovian control mechanism over the
instrumental control mechanisms attempting to learn the task
(Guitart-Masip et al., 2013). Similarly, in older adults,
structural MRI measures of substantia nigra/ventral tegmental area
(SN/VTA) integrity have also been linked to improved learning
and a lower action bias (Chowdhury et al., 2013). One proposed
explanation for the reduced coupling between action and valence
in conditions associated with increased DA availability has been
a likely increase of dopaminergic activity in the prefrontal cortex,
where DA influences the balance between different control
mechanisms (Hitchcott et al., 2007). The involvement of a prefrontal
mechanism that decreases the Pavlovian influences on behavior and
supports performance in the no-go to win condition of this task
has been shown in fMRI (Guitart-Masip et al., 2012b) and EEG
experiments (Cavanagh et al., 2013). It should be noted, though,
that, in the present study, we did not observe any behavioral
differences as a function of the COMT Val108/158Met
polymorphism, which has previously been linked to prefrontal dopamine
availability (Meyer-Lindenberg et al., 2005).
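
The idea of a Pavlovian control mechanism competing with instrumental control can be illustrated with a small sketch in the spirit of the computational modeling mentioned above. It is an assumption-laden illustration rather than the fitted model: the function `p_go`, the parameters `pavlovian_weight` and `go_bias`, and all numerical values are hypothetical.

```python
import math


def p_go(q_go, q_nogo, state_value, pavlovian_weight, go_bias=0.0):
    """Return the probability of a go response for one stimulus.

    The go weight is the instrumental value of go plus a fixed bias plus
    a Pavlovian term proportional to the stimulus value; the no-go weight
    is purely instrumental. A two-option softmax (logistic function of
    the weight difference) turns the weights into a choice probability.
    """
    w_go = q_go + go_bias + pavlovian_weight * state_value
    w_nogo = q_nogo
    return 1.0 / (1.0 + math.exp(-(w_go - w_nogo)))


# Hypothetical no-go to win stimulus: withholding is instrumentally better
# (q_nogo > q_go), but the stimulus predicts reward (state_value > 0).
with_pavlovian = p_go(q_go=0.0, q_nogo=0.5, state_value=0.5, pavlovian_weight=2.0)
without_pavlovian = p_go(q_go=0.0, q_nogo=0.5, state_value=0.5, pavlovian_weight=0.0)
```

With a non-zero Pavlovian weight, the reward expectation pushes the choice toward go even though no-go has the higher instrumental value; setting the weight to zero removes this coupling between valence and action.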

Receptor binding studies in vitro and in vivo have shown that
A1 carriers show lower striatal D2 receptor expression (Noble
et al., 1991; Thompson et al., 1997; Pohjalainen et al., 1998;
Jonsson et al., 1999; Ritchie and Noble, 2003). On the other hand,
A1 carriers also exhibit increased striatal DA synthesis, possibly
as a result of reduced autoinhibitory signaling from presynaptic
D2-type autoreceptors (Laakso et al., 2005). Previous behavioral
and neuroimaging studies have in fact yielded results that would
be best explained by a parallel reduction of striatal postsynaptic
D2 receptors and increased presynaptic dopaminergic activity in
A1 carriers, with the latter also resulting in increased DA
availability both in the striatum and in extrastriatal regions (Kirsch
et al., 2006; Stelzel et al., 2010; Richter et al., 2013). According
to those observations, A1 carriers would be assumed to show a
less pronounced decrease of dopaminergic signaling after
negative prediction errors in the indirect pathway and a shift to a
more action-oriented behavioral pattern mediated by the direct
pathway (Figure 3). Such a pattern bears some resemblance to the
concept of behavioral impulsivity (Tomie et al., 1998; Flagel et al.,
2010, 2011), and it is noteworthy in this context that the A1 allele
has been linked to risk for impulsivity-related psychiatric
disorders, most prominently alcohol dependence (Noble et al., 1991;
Comings et al., 1996; Noble, 2003; Eisenberg et al., 2007; Wang
et al., 2013). However, this does not explain why A1 carriers
exhibit a relatively specific performance disadvantage in the no-go
to win, but not in the no-go to avoid losing condition. One possible
reason would be that receiving a punishment instead of neutral
feedback in the no-go to avoid losing condition might lead to a higher
prediction error than receiving neutral feedback instead of a
reward in the no-go to win condition. Another reason might be
that, for example, serotonin plays a specific role in
punishment-related behavior (Daw et al., 2002; Boureau and Dayan, 2011;
Cools et al., 2011; Guitart-Masip et al., 2012b, 2013; Den Ouden
et al., 2013) and thus further modulates the performance in the
no-go to avoid losing condition.

The investigation of modulators of stereotyped hard-wired
behavioral responses is of interest to clinicians as it may help
to develop novel treatment approaches for neurological or
psychiatric disorders. The TaqIA polymorphism is one of the most
extensively studied genetic variations in neuropsychiatric
disorders with presumed dopaminergic dysfunction, and studies have
pointed to a potential pleiotropic effect with A1 allele carriers
showing an increased risk for addiction, but a lower risk for
schizophrenia (e.g., Comings et al., 1996; Noble, 2003; Dubertret
et al., 2004; Wang et al., 2013; Zhang et al., 2014). Moreover,
studies in healthy humans have suggested a role of the TaqIA
A1 variant in approach-related personality traits (Noble et al.,
1998; Reuter et al., 2006; Lee et al., 2007; Smillie et al., 2010) and
in motivated interference processing (Richter et al., 2013). The
relation between the single nucleotide polymorphism (SNP) and
instrumental learning has also been investigated. Previous studies
have shown that carriers of the A1 allele are impaired in no-go
learning to avoid behaviors that yield negative outcomes (Klein
et al., 2007; Frank and Hutchison, 2009; Jocham et al., 2009).
However, those studies have only used conditions in which
participants had to approach a reward or avoid a punishment. Since
the interaction between action and valence has a pivotal influence
on instrumental learning (Guitart-Masip et al., 2012b), such
studies could not provide information on possible action by valence
interactions. The use of the valenced go/no-go learning task
with orthogonalized action and valence enables a more precise
investigation of the contribution of the dopaminergic system to
behavioral adaptation.

FIGURE 3 | A model of the putative influence of the TaqIA
polymorphism on action-valence interaction. DA neurons signal reward
prediction errors in the form of phasic bursts for positive prediction
errors and dips below baseline firing rate for negative prediction errors.
Increases of DA in response to an unexpected reward reinforce the
direct pathway via activation of D1 receptors and thereby facilitate the
future generation of go choices under similar circumstances, while dips
in DA levels in response to an unexpected punishment reinforce the
indirect pathway via reduced activation of D2 receptors and thus facilitate
the subsequent generation of no-go choices in comparable situations. A1
carriers have fewer D2 receptors and thus would be assumed to have less
limitation of dopaminergic signaling after negative prediction errors in the
indirect pathway and a shift to a more action-oriented behavioral pattern
mediated by the direct pathway.

The TaqIA polymorphism, initially identified to be located on
the DRD2 gene on human chromosome 11q22-23 (Grandy et al.,
1989), is located 10 kb downstream of the DRD2 termination
codon on 11q23.1, within the coding region of the adjacent ankyrin
repeat and kinase domain containing 1 (ANKK1) gene (Dubertret
et al., 2004; Neville et al., 2004). Because the DRD2 and ANKK1
genes are closely linked (Neville et al., 2004; Ponce et al., 2009),
it has been proposed that genetic variations in linkage
disequilibrium (LD) with the SNP might explain the observed
relationship between the TaqIA polymorphism and alterations of human
dopaminergic neurotransmission. The SNP is indeed in LD with several
polymorphisms on the DRD2 gene (Duan et al., 2003; Ritchie and
Noble, 2003; Fossella et al., 2006), and one of them is the C957T
polymorphism (rs6277), for which modulations of instrumental
learning have also been observed (Frank et al., 2007, 2009;
Frank and Hutchison, 2009). However, its influence on
dopaminergic neurotransmission is not clear, since in vivo and in vitro data
are in conflict (Duan et al., 2003; Hirvonen et al., 2004; see also
erratum by Hirvonen et al., 2004, 2009a,b) and no association was
found between C957T and DA synthesis capacity in vivo (Laakso
et al., 2005) or between C957T and D2 receptor mRNA expression in
post mortem brain tissue (Zhang et al., 2007). When controlling for a
potential influence of this SNP in our analysis, the effect of TaqIA
genotype was still significant. We cannot rule out, though, that
another variant in the DRD2 gene, or perhaps in the ANKK1
gene, linked to TaqIA might be responsible for the observed
genotype-related differences in learning rate.

In order to control for the influence of another genetic
variant known to affect prefrontal DA levels and thereby
cortical D1 receptor stimulation (Gogos et al., 1998; Matsumoto
et al., 2003; Meyer-Lindenberg et al., 2005; Slifstein et al., 2008),
we selected our participants to have comparable distributions
of the COMT Val108/158Met genotype. Importantly, the
distribution of COMT Val108/158Met alleles did not differ
significantly between TaqIA A1 carriers and A2 homozygotes.

It must nevertheless be kept in mind that genetic variations
within the dopaminergic system do not exert their effects in
isolation. Frank et al. (2007), for example, observed multiple
roles for DA in reinforcement learning when investigating effects
of the COMT Val108/158Met, the DARPP-32, and the DRD2
C957T polymorphisms on reward-based probabilistic learning.
Even though we controlled for these polymorphisms in our
experiment, we cannot completely rule out gene-gene interactions. Our
moderately large sample sizes allowed us to examine effects of
single genetic variants on behavioral outcomes, but the
systematic analysis of gene-gene interactions would require substantially
larger cohorts. In addition to the likely polygenic contribution
of variants in the dopaminergic system to action by valence
interaction, other neuromodulatory transmitters must also be
considered in future studies.

CONCLUSION 

Our findings provide further evidence for a potential genetic
basis of individual differences in probabilistic learning and,
more specifically, suggest that genetically mediated differences
in dopaminergic neuromodulation not only affect learning per
se, but can also specifically affect behavioral phenomena such as a
Pavlovian action bias when a reward is expected. With respect to
future research directed at individual differences in learning, our
findings should caution researchers to take into account
the non-orthogonal nature of action by valence interactions.